feat(optimizer): Fix qualify for SEMI/ANTI joins #4622

Merged: 3 commits, Jan 16, 2025


VaggelisD (Collaborator) commented Jan 15, 2025

Fixes TobikoData/sqlmesh#3557

The right-hand side (RHS) table of a SEMI/ANTI join shouldn't be collected as a selected source (unlike in normal joins), since it's only used for filtering under the hood.
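To make the distinction concrete, here is a minimal sketch in plain Python. It is not sqlglot's Scope code: `Join` and `selected_sources` are hypothetical names in a toy model where a SELECT has one FROM table plus a list of joins, and only joins other than SEMI/ANTI contribute sources whose columns can be projected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Join:
    """Hypothetical toy join node: the joined table's alias and the join kind."""

    alias: str
    kind: str = ""  # e.g. "", "LEFT", "SEMI", "ANTI"


# Join kinds whose RHS only filters rows and contributes no output columns.
FILTER_ONLY_KINDS = {"SEMI", "ANTI"}


def selected_sources(from_alias: str, joins: list[Join]) -> list[str]:
    """Return the aliases whose columns may appear in the SELECT list."""
    sources = [from_alias]
    for join in joins:
        # Skip SEMI/ANTI RHS tables: they are consulted for filtering only.
        if join.kind.upper() not in FILTER_ONLY_KINDS:
            sources.append(join.alias)
    return sources


print(selected_sources("l", [Join("r", "SEMI")]))  # ['l']
print(selected_sources("l", [Join("r", "LEFT")]))  # ['l', 'r']
```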

Note that, as a side effect, the _expand_using rule is turned off for SEMI/ANTI joins, since we would otherwise fail to resolve the RHS-related columns once they're expanded.

PS: I believe an alternative solution would be to run transforms.py::eliminate_semi_and_anti_joins before the qualification step.
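For comparison, a rough string-based sketch of what such a rewrite does: the SEMI/ANTI join becomes a WHERE [NOT] EXISTS filter, so the RHS table lives in the subquery's scope rather than the outer query's. The function `semi_anti_to_exists` is hypothetical and only illustrates the shape of the rewritten query; the real transform works on sqlglot's AST, and its exact output may differ.

```python
def semi_anti_to_exists(projections: str, left: str, right: str, on: str, kind: str) -> str:
    """Rewrite `left {kind} JOIN right ON {on}` as a WHERE [NOT] EXISTS filter (toy)."""
    kind = kind.upper()
    if kind not in ("SEMI", "ANTI"):
        raise ValueError(f"expected a SEMI or ANTI join, got {kind!r}")
    # The RHS table now appears only inside the correlated subquery.
    exists = f"EXISTS (SELECT 1 FROM {right} WHERE {on})"
    if kind == "ANTI":
        exists = f"NOT {exists}"
    return f"SELECT {projections} FROM {left} WHERE {exists}"


print(semi_anti_to_exists("l.id, l.val_l", "l", "r", "l.id = r.id", "SEMI"))
# SELECT l.id, l.val_l FROM l WHERE EXISTS (SELECT 1 FROM r WHERE l.id = r.id)
```

Because the outer query no longer joins `r` directly, star expansion in the outer scope would only see `l`.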

georgesittas (Collaborator) commented:

We still want to transform the USING clause into ON, in order to canonicalize those queries. Isn't the only problem here that we're including columns qualified with "r" in the expanded star?
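As a rough illustration of that canonicalization (not sqlglot's `_expand_using` implementation), the hypothetical helper `using_to_on` turns the column list of a USING clause into the equivalent ON condition, quoting identifiers the way the optimized queries in this thread do.

```python
from __future__ import annotations


def using_to_on(left: str, right: str, using_columns: list[str]) -> str:
    """Build the ON condition equivalent to `USING (col, ...)` between two aliases."""
    if not using_columns:
        raise ValueError("USING requires at least one column")
    # One equality per shared column, combined with AND.
    return " AND ".join(f'"{left}"."{col}" = "{right}"."{col}"' for col in using_columns)


print(using_to_on("l", "r", ["id"]))  # "l"."id" = "r"."id"
```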

From the linked issue, this is what we produce:

SELECT
  COALESCE("l"."id", "r"."id") AS "id",
  "l"."val_l" AS "val_l",
  "r"."val_r" AS "val_r"
FROM "l" AS "l"
SEMI JOIN "r" AS "r"
  ON "l"."id" = "r"."id"

But, as Bill pointed out, producing this instead works (I verified it locally):

SELECT
  "l"."id" AS "id",
  "l"."val_l" AS "val_l"
FROM "l" AS "l"
SEMI JOIN "r" AS "r"
  ON "l"."id" = "r"."id"

That means SEMI/ANTI joins need special treatment in the star expansion logic, so that no "r".<column> projections are introduced. Notice that having "r"."id" in the join condition doesn't cause any issues.
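A minimal sketch of that special case, assuming a single two-table join with USING, the column lists of both tables known up front, and every USING column present in the left table (`expand_star` and its parameters are hypothetical, not sqlglot's API). For a normal join the USING columns are merged with COALESCE, mirroring the first query above; for SEMI/ANTI joins only left-hand columns are projected, which reproduces the second query's projection list.

```python
from __future__ import annotations


def expand_star(
    left: str,
    left_columns: list[str],
    right: str,
    right_columns: list[str],
    using: list[str],
    join_kind: str = "",
) -> list[str]:
    """Expand `SELECT *` over `left <join_kind> JOIN right USING (...)` (toy model)."""
    filter_only = join_kind.upper() in ("SEMI", "ANTI")
    projections = []
    for col in left_columns:
        if col in using and not filter_only:
            # Regular joins: a USING column may come from either side.
            projections.append(f'COALESCE("{left}"."{col}", "{right}"."{col}") AS "{col}"')
        else:
            projections.append(f'"{left}"."{col}" AS "{col}"')
    if not filter_only:
        # Only regular joins expose the RHS table's remaining columns.
        projections.extend(
            f'"{right}"."{col}" AS "{col}"' for col in right_columns if col not in using
        )
    return projections


for projection in expand_star("l", ["id", "val_l"], "r", ["id", "val_r"], ["id"], "SEMI"):
    print(projection)
# "l"."id" AS "id"
# "l"."val_l" AS "val_l"
```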

VaggelisD (Collaborator, Author) commented Jan 16, 2025

I unblocked the USING expansion by adding a new map on Scope that collects the SEMI/ANTI join tables as "pseudo" selected sources, in order to bypass validate_qualify_columns.
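The gist of that approach, sketched under assumptions: the toy `ToyScope` keeps SEMI/ANTI RHS tables in a separate mapping next to the regular selected sources. The names `ToyScope`, `semi_anti_join_sources` and `unresolved_columns` are hypothetical and are not the attribute the PR adds to sqlglot's Scope. Validation accepts a column qualified by either kind of source, while star expansion would consult `selected_sources` alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyScope:
    # alias -> table name for sources whose columns can be projected
    selected_sources: dict[str, str] = field(default_factory=dict)
    # alias -> table name for SEMI/ANTI RHS tables, used only for filtering
    semi_anti_join_sources: dict[str, str] = field(default_factory=dict)


def unresolved_columns(scope: ToyScope, columns: list[tuple[str, str]]) -> list[str]:
    """Return `table.column` references whose table is not a known source."""
    # Both maps count as known sources for validation purposes.
    known = scope.selected_sources.keys() | scope.semi_anti_join_sources.keys()
    return [f"{table}.{col}" for table, col in columns if table not in known]


scope = ToyScope({"l": "l"}, {"r": "r"})
print(unresolved_columns(scope, [("l", "id"), ("r", "id")]))  # []
print(unresolved_columns(scope, [("x", "id")]))  # ['x.id']
```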

Two review threads on sqlglot/optimizer/scope.py (outdated, resolved).
georgesittas merged commit f7628ad into main on Jan 16, 2025 (7 checks passed).
georgesittas deleted the vaggelisd/ddb_semi_anti_join_qualify branch on January 16, 2025 at 19:17.
Development: successfully merging this pull request may close this issue:

Columns from SEMI and ANTI joins are handled like "normal" joins (DuckDB)

3 participants